NEW  YORK  UNIVERSITY 

INSTITUTE  OF  MATHEMATICAL  SCIENCES 

LIBRARY

4  Washington  Place,  New  York  3,  N.  Y 


IMM-NYU    275 
OCTOBER 1960


NEW  YORK  UNIVERSITY 
INSTITUTE   OF 
MATHEMATICAL  SCIENCES 


On The Foundations of Statistical Inference. II


ALLAN  BIRNBAUM 


PREPARED  UNDER 
CONTRACT  NO.  NONR-285(38) 
WITH  THE 

OFFICE  OF  NAVAL  RESEARCH 
UNITED  STATES  NAVY 


REPRODUCTION IN WHOLE OR IN PART
IS PERMITTED FOR ANY PURPOSE
OF THE UNITED STATES GOVERNMENT.





This report represents results obtained at the
Institute of Mathematical Sciences, New York
University, under the sponsorship of the Office
of Naval Research, Contract No. Nonr-285(38).



CONTENTS

0. Introduction and summary

Part A: k simple hypotheses

1. The canonical form of an experiment
2. Some algebra of statistical experiments
3. The partial ordering of simple cyclic-symmetric experiments
4. Inference methods with intrinsic justifications
5. Intrinsic confidence methods
6. An interpretation of the "principle of insufficient reason"
7. An interpretation of Fisher's "fiducial argument"
8. The relativity of intrinsic interpretations expressed in terms of error-probabilities

Part B: Translation and scale parameters

9. Conditional inference methods
10. Intrinsic confidence methods
11. Discussion
12. Acknowledgements



0.   Introduction  and  summary.   Some  principal  technical 

developments  of  Part  I  of  this  paper  are  derived  here  in  more 
elementary fashion, under the restriction to statistical
experiments with discrete sample spaces, but under the more
general  condition  that  any  finite  number  of  (simple)  statistical 
hypotheses  may  be  represented.   For  any  such  experiment,  it  is 
shown  that  for  typical  purposes  of  informative  statistical 
inference,  just  the  likelihood  function  on  an  observed  outcome 
can  and  should  be  reported  and  interpreted  to  provide  inferences 
of  general  interest  concerning  the  statistical  hypotheses  (or 
unknown  parameter  values);  and  that  for  such  purposes,  the 
structure  of  the  experiment  from  which  an  outcome  was  obtained 
is irrelevant, apart from determination of the likelihood
function.   Specific  techniques  for  interpretation  of  likelihood 
functions  are  developed,  particularly  "intrinsic  confidence 
methods"  which  constitute  an  appropriate  generalization  and 
refinement  of  confidence  methods  and  conditional  confidence 
methods.   The  relations  of  such  methods  to  traditional  methods 
based on the "principle of insufficient reason" are discussed,
as to form and interpretation. In Sections 9-11, analogous
developments are given for experiments involving translation
or scale parameters.

Part A: k simple hypotheses

1. The canonical form of an experiment. We consider a
given experiment E, assuming that questions of experimental
design, including those of choice of a sample size or possibly
a sequential sampling rule, have been dealt with, and that the
sample space of possible outcomes x of E is a specified set
$S = \{x\}$. We assume that each of the possible distributions of X
is represented by a specified elementary probability function
$f_i(x)$: if the hypothesis $H_i$ is true, the probability that E
yields an outcome x in A is

$$P_i(A) = \int_A f_i(x)\, d\mu(x),$$

where $\mu$ is a specified $\sigma$-finite measure on S, and A is any
measurable set.

We assume here that a finite number k of hypotheses are
under consideration: $H_1, \ldots, H_k$, $k \ge 2$. We shall omit comments
on the particular features of the case of binary experiments,
k = 2, which were discussed in Part I; and we shall refer to
Part I at some points where methods or interpretations are
immediately applicable without complication to experiments
with k > 2.

For  any  binary  experiment  E,  let 


$$r_{ij} = r_{ij}(x) = \log[f_i(x)/f_j(x)], \quad i, j = 1, \ldots, k, \quad i \ne j.$$

Let

$$r = r(x) = [r_{12}(x), \ldots, r_{1k}(x), r_{23}(x), \ldots, r_{2k}(x), \ldots, r_{(k-1),k}(x)].$$

It is well-known that r is a sufficient statistic, which may
or may not be minimal sufficient, depending upon the structure
of E. (If $f_i(x) = f_j(x) = 0$ or $\infty$, we define $r_{ij}(x) = 0$. The
statistic r contains components which are redundant in many
experiments; for example, if $0 < f_2(x) < \infty$ for all x, then
for all x we have $r_{13}(x) = r_{12}(x) + r_{23}(x)$. However it is
convenient to tolerate such possible redundancies for purposes
of general discussion, and to take account of them appropriately
for more specific purposes.)
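
As an illustration for the discrete case, the following minimal Python sketch computes the components $r_{ij}$ (for i < j) from the likelihoods $f_1(x), \ldots, f_k(x)$ at one sample point. The function name and the dictionary layout are assumptions made for illustration only; natural logarithms are assumed, and the convention $r_{ij} = 0$ when both likelihoods vanish is the one stated in the text.

```python
import math
from itertools import combinations

def log_likelihood_ratios(likelihoods):
    """Return {(i, j): r_ij} for i < j (1-based), given f_1(x), ..., f_k(x)."""
    r = {}
    for i, j in combinations(range(len(likelihoods)), 2):
        fi, fj = likelihoods[i], likelihoods[j]
        if fi == 0 and fj == 0:
            value = 0.0            # convention stated in the text
        elif fj == 0:
            value = math.inf       # H_j excluded by this outcome
        elif fi == 0:
            value = -math.inf      # H_i excluded by this outcome
        else:
            value = math.log(fi / fj)
        r[(i + 1, j + 1)] = value
    return r

# Hypothetical likelihoods at one sample point, k = 3.
r = log_likelihood_ratios([0.6, 0.2, 0.2])
```

When all likelihoods are positive, the redundancy noted in the text shows up as `r[(1, 3)]` equalling `r[(1, 2)] + r[(2, 3)]` up to rounding.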
Let

$$F_i(r) = \mathrm{Prob}[r(X) \le r \mid H_i], \quad i = 1, \ldots, k,$$

where the inequality between vectors denotes the corresponding
inequality between respective coordinates: $r_{ij}(X) \le r_{ij}$. In
general, $F_i(r)$ is a generalized multivariate distribution
function, and r(X) is a generalized multivariate random
variable, in the sense that some coordinates of the latter may
assume infinite values with positive probability under some
hypotheses. The set of k distribution functions $F_i$ of the
statistic r may be taken as a canonical form of any experiment.

For some purposes, a different canonical representation
of an experiment may be more convenient. For example, let
k = 3, let $g = (g_1, g_2, g_3)$ be any set of formal prior probabilities
for the respective hypotheses. Let $g_i^*(x)$ denote the
formal posterior probability of $H_i$, given that X = x, for each
i and x. Let d(x) be any (measurable) function such that,
for each x, the value of d(x) is the index (i = 1, 2, or 3) of
the hypothesis $H_i$ (or one of the hypotheses) with greatest
posterior probability, given X = x; that is, d(x) is a formal
Bayes solution, with respect to the prior distribution g. Let

$$\alpha_i = \mathrm{Prob}[d(X) \ne i \mid H_i], \quad i = 1, 2, 3,$$

and let $\alpha = (\alpha_1, \alpha_2, \alpha_3)$.
Then $\alpha$ is the set of error-probabilities $\alpha_i$, under respective
hypotheses $H_i$, of the inference (or decision) function d(x).
A basic part of the theory of statistical decision functions
is the investigation, for various experiments, of the set of
all such points $\alpha$, under all possible choices of g. It is
well-known that, for each g, the set of such points is either
a single point, or a line-segment, or a convex subset of a plane
(that is, a convex set, of dimension at most (k-1) = 2); and
that, for all g, the set is a convex surface in the unit cube.
(The preceding assumes tacitly, without essential loss of
generality, that x includes an observation on an auxiliary
randomization variable.) But

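$$1 - \alpha_1 = \mathrm{Prob}[r_{12}(X) \mathrel{.{\ge}.} \log(g_2/g_1),\ r_{13}(X) \mathrel{.{\ge}.} \log(g_3/g_1) \mid H_1],$$

$$1 - \alpha_2 = \mathrm{Prob}[r_{12}(X) \mathrel{.{\le}.} \log(g_2/g_1),\ r_{23}(X) \mathrel{.{\ge}.} \log(g_3/g_2) \mid H_2], \text{ and}$$

$$1 - \alpha_3 = \mathrm{Prob}[r_{13}(X) \mathrel{.{\le}.} \log(g_3/g_1),\ r_{23}(X) \mathrel{.{\le}.} \log(g_3/g_2) \mid H_3],$$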

for any d(x) corresponding to a given g; here the inequality
symbols $\mathrel{.{\le}.}$ and $\mathrel{.{\ge}.}$ refer to the arbitrary definition of d(x) on
points where equality holds. In experiments for which the
distributions of components $r_{ij}(X)$ are continuous, these equations
define a unique point $\alpha = \alpha(g)$ for each g; since $r_{23}(x) = r_{13}(x) - r_{12}(x)$,
the distribution $F_1(r)$ of r(X) under $H_1$ is represented
directly by the above equation for $1 - \alpha_1$ when g is varied over
its range. Similarly the distributions $F_2(r)$ and $F_3(r)$ are
represented by the equations for $1 - \alpha_2$ and $1 - \alpha_3$. The same
interpretations can be given, with attention to details, in
cases of discontinuous distributions. Thus the convex surface
of points $\alpha$ is a canonical form of an experiment, which is
convenient for some purposes.
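
For a discrete experiment the point $\alpha(g)$ can be computed directly. The sketch below is illustrative only: it assumes the experiment is supplied as a matrix whose entry in row i and column j is Prob[X = j | H_i], and it resolves ties in the posterior by taking the smallest index, which is one of the arbitrary choices the text allows on points of equality (no auxiliary randomization is modelled). The function name is hypothetical.

```python
from fractions import Fraction as F

def bayes_error_probabilities(E, g):
    """Error-probabilities (alpha_1, ..., alpha_k) of a formal Bayes rule d(x).

    E[i][j] = Prob[X = j | H_i]; g = formal prior probabilities.
    Ties are broken toward the smallest hypothesis index (an assumption).
    """
    k, m = len(E), len(E[0])
    alpha = [F(0)] * k
    for j in range(m):
        # Posterior is proportional to prior times likelihood.
        weights = [g[i] * E[i][j] for i in range(k)]
        d = weights.index(max(weights))
        for i in range(k):
            if i != d:
                alpha[i] += E[i][j]   # outcome j leads to an error under H_i
    return alpha

# Hypothetical experiment with k = 3 hypotheses and 3 outcomes.
E = [[F(3, 5), F(1, 5), F(1, 5)],
     [F(1, 5), F(3, 5), F(1, 5)],
     [F(1, 5), F(1, 5), F(3, 5)]]
uniform = bayes_error_probabilities(E, [F(1, 3)] * 3)
degenerate = bayes_error_probabilities(E, [F(1), F(0), F(0)])
```

Varying g over the simplex and collecting the resulting points traces out the part of the surface reachable without randomization.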

In the next sections we shall for the most part consider
experiments with discrete distributions $f_i(x)$ (or, slightly
more generally, discrete distributions $F_i(r)$). For our purposes,
it will be convenient to represent each such experiment E by
a stochastic matrix:

$$E = (p_{ij}), \quad i = 1, \ldots, k, \quad j = 1, \ldots, m; \qquad \sum_{j=1}^{m} p_{ij} = 1 \text{ for each } i.$$

Here $p_{ij} = \mathrm{Prob}[X = j \mid H_i]$; the sample space is the range of j;
m may be finite or infinite; and in the latter case the range of
j can when convenient be taken to be the doubly-infinite sequence
of integers, $-\infty < j < \infty$.

Redundancies in such representations of experiments may be
eliminated as follows, when desired: (A) If two columns of such
a matrix are proportional ($p_{ij} = c\,p_{ih}$ for some $j \ne h$ and some
c, for each i), these columns may be deleted and replaced by
the single column having elements $(p_{ij} + p_{ih})$, with an appropriate
revision of the subscripts j. (Since the probability of X = j,
given that X = j or h, is independent of i, the "simpler"
experiment is not less informative.) If all such possible
simplifications are made, j is a minimal sufficient statistic.
(B) Since a permutation of columns represents a re-labeling of
sample points, experiments differing only in this respect are
equivalent. (C) When convenient, a standard manner of ordering
columns may be adopted.
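
Steps (A) to (C) can be sketched for a finite matrix as follows. This is an illustrative sketch, not a prescribed procedure: the function name is hypothetical, columns are grouped by their normalized form (so that proportional columns share a key), and the "standard ordering" used here, descending lexicographic order of columns, is an arbitrary choice.

```python
from fractions import Fraction as F

def merge_proportional_columns(E):
    """Merge proportional columns of a finite stochastic matrix E (rows = hypotheses)."""
    k, m = len(E), len(E[0])
    merged = {}
    for j in range(m):
        col = [E[i][j] for i in range(k)]
        total = sum(col)
        if total == 0:
            continue  # outcome impossible under every hypothesis
        key = tuple(p / total for p in col)       # same key <=> proportional columns
        previous = merged.get(key, [0] * k)
        merged[key] = [a + b for a, b in zip(previous, col)]   # step (A)
    cols = sorted(merged.values(), reverse=True)  # steps (B), (C): fix an ordering
    return [[c[i] for c in cols] for i in range(k)]

# Hypothetical experiment: the first two columns are proportional (ratio 3 : 1).
E = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 6), F(1, 12), F(3, 4)]]
reduced = merge_proportional_columns(E)
```

Exact rational arithmetic is used so that proportionality is tested exactly; with floating-point entries a tolerance would be needed.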

2. Some algebra of statistical experiments. Except where
the contrary is indicated, we assume that experiments for some
fixed number k of hypotheses are under consideration. An
experiment $E = (p_{ij})$ will be called simple if its matrix has
(after the simplifying operations described above) at most k
columns; that is, a simple experiment has not more sample points
(after simplification) than hypotheses. The completely informative
experiment is (equivalent to) the identity matrix of order k;
the uninformative experiment is (equivalent to) the single-column
matrix with elements all unity; all other experiments
(not necessarily simple) are called incompletely informative.
Any experiment having two identical rows ($p_{ij} = p_{hj}$ for some
$i \ne h$ and all j) will be called degenerate; even many replications
of such an experiment are without value for distinguishing between
certain of the hypotheses. An experiment E is called at
least as informative as an experiment E*, or is said to
contain E*, if there exists a stochastic matrix $Q = (q_{ij})$ such
that E* = EQ; that is, if $p^*_{ij} = \sum_{h=1}^{m} p_{ih} q_{hj}$. It is known
that E contains E* if and only if the convex hypersurface of
points $\alpha$, which constitutes a canonical form of E in the sense
illustrated in the preceding Section, encloses the convex
hypersurface of points $\alpha^*$ corresponding to E*. The relation "contains"
determines a partial ordering of all experiments for k hypotheses.

Let $E_h$, h = 1, 2, ..., denote any sequence of experiments, and
let $g = (g_1, g_2, \ldots)$ be any sequence of probabilities, $\sum_h g_h = 1$,
assigned formally to the respective experiments. Then
$E = \sum_h \oplus\, g_h E_h$ represents "the mixture g of the experiments $E_h$":
the experiment E consists of the observation of a value h of
the random variable H with distribution g, followed by use of
the corresponding experiment $E_h$. If each of the "component"
experiments $E_h$ can be represented by a finite matrix $(p^h_{ij})$,
then E is easily represented (before possible simplification)
by the matrix (consisting of successive finite blocks)

$$E = [(g_1 p^1_{ij}), (g_2 p^2_{ij}), \ldots].$$


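Example 1. Let

$$E^* = \begin{pmatrix} .6 & .2 & .2 \\ .2 & .6 & .2 \\ .2 & .2 & .6 \end{pmatrix} \quad \text{and let} \quad E^{**} = \frac{1}{7} \begin{pmatrix} 1 & 3 & 3 \\ 3 & 1 & 3 \\ 3 & 3 & 1 \end{pmatrix}.$$

Let E be the mixture g = (5/12, 7/12) of these respective
experiments. Then

$$E = \frac{5}{12} E^* \oplus \frac{7}{12} E^{**} = \frac{1}{12} \begin{pmatrix} 3 & 1 & 1 & 1 & 3 & 3 \\ 1 & 3 & 1 & 3 & 1 & 3 \\ 1 & 1 & 3 & 3 & 3 & 1 \end{pmatrix}.$$

It is readily verified that E has an alternative decomposition
represented by $E = \frac{1}{3} E_1 \oplus \frac{1}{3} E_2 \oplus \frac{1}{3} E_3$, where

$$E_1 = \frac{1}{4} \begin{pmatrix} 3 & 1 \\ 1 & 3 \\ 1 & 3 \end{pmatrix}, \quad E_2 = \frac{1}{4} \begin{pmatrix} 1 & 3 \\ 3 & 1 \\ 1 & 3 \end{pmatrix}, \quad E_3 = \frac{1}{4} \begin{pmatrix} 1 & 3 \\ 1 & 3 \\ 3 & 1 \end{pmatrix}.$$

An experiment will be called cyclic-symmetric (abbreviated
"c.s.") if it can be represented in the form $E = (p_{ij}) = (A_1, A_2, \ldots)$,
where each $A_h$ is a square cyclic-symmetric matrix. (A square
matrix $(a_{uv})$ of order k is cyclic-symmetric if its elements
satisfy $a_{uv} = a_{u+1,v+1}$ and $a_{kv} = a_{1,v+1}$, for u, v = 1, ..., (k-1).)
Examples are the experiments E*, E**, and E of Example 1 above.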
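
The arithmetic of Example 1 can be checked mechanically. The sketch below is a hedged illustration: it assumes the reading of Example 1 in which E* has rows that are cyclic shifts of (.6, .2, .2), E** has rows that are cyclic shifts of (1, 3, 3)/7, the weights are (5/12, 7/12), and the alternative components are the three 3-by-2 matrices with entries in quarters. All function names are hypothetical. Experiments are compared up to a permutation of columns, which the text treats as an equivalence.

```python
from fractions import Fraction as F

def mixture(weights, experiments):
    """Block matrix [(g_1 E_1), (g_2 E_2), ...]; every E_h has the same k rows."""
    k = len(experiments[0])
    return [[g * p for g, Eh in zip(weights, experiments) for p in Eh[i]]
            for i in range(k)]

def circulant(first_row):
    """Square matrix whose (u, v) entry depends only on (v - u) mod k."""
    k = len(first_row)
    return [[first_row[(v - u) % k] for v in range(k)] for u in range(k)]

def is_cyclic_symmetric(A):
    """Check a_uv = a_(u+1)(v+1), indices taken cyclically."""
    k = len(A)
    return all(A[u][v] == A[(u + 1) % k][(v + 1) % k]
               for u in range(k) for v in range(k))

def column_multiset(M):
    return sorted(tuple(row[j] for row in M) for j in range(len(M[0])))

E_star = circulant([F(3, 5), F(1, 5), F(1, 5)])
E_2star = circulant([F(1, 7), F(3, 7), F(3, 7)])
E = mixture([F(5, 12), F(7, 12)], [E_star, E_2star])

E1 = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)], [F(1, 4), F(3, 4)]]
E2 = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
E3 = [[F(1, 4), F(3, 4)], [F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
alternative = mixture([F(1, 3)] * 3, [E1, E2, E3])

same_experiment = column_multiset(E) == column_multiset(alternative)
blocks = [[row[0:3] for row in E], [row[3:6] for row in E]]
blocks_are_cs = all(is_cyclic_symmetric(A) for A in blocks)
```

Under these assumptions both `same_experiment` and `blocks_are_cs` evaluate to `True`, which is the content of the two claims made in the example.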

Lemma 1. Every experiment $E = (p_{ij})$ is a component of some
cyclic-symmetric experiment.

Proof: Let $E_h = (p^h_{ij})$, where $p^h_{ij} = p_{i-1+h,\,j}$, for h = 1, ..., k,
where a subscript exceeding k is to be diminished by k; thus $E_1 = E$.
Let the mixture of these experiments, with equal weights, be

$$\sum_{h=1}^{k} \oplus\, \frac{1}{k} E_h.$$

Even if E is not finite, it is possible to order the columns
of the mixture so as to exhibit its cyclic symmetry, thus: the first k
columns of the mixture are respectively the initial columns of $E_1, \ldots, E_k$,
each multiplied by 1/k; the next k columns are respectively
the second columns of these matrices, multiplied by 1/k; etc.

Example: Let the given experiment be $E_1$ of the preceding Example 1. Then $E_2$, $E_3$,
and the mixture E are as defined in that example.
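
The construction in the proof of Lemma 1 can be sketched for a finite matrix. The sketch is illustrative and rests on one stated assumption: the rows are shifted in the direction that makes each k-by-k block satisfy $a_{uv} = a_{u+1,v+1}$ without further reordering, which amounts only to a relabeling of the shifted experiments (a column permutation, hence an equivalent experiment). The function names are hypothetical.

```python
from fractions import Fraction as F

def cyclic_completion(E):
    """Return the k-by-k blocks A_1, ..., A_m of the equal-weight mixture of
    all cyclic row-shifts of E (E has k rows and m columns)."""
    k, m = len(E), len(E[0])
    blocks = []
    for j in range(m):
        # Column h of block j is the j-th column of the h-th shifted
        # experiment, multiplied by the mixing weight 1/k.
        blocks.append([[E[(i - h) % k][j] / k for h in range(k)]
                       for i in range(k)])
    return blocks

def is_cyclic_symmetric(A):
    k = len(A)
    return all(A[u][v] == A[(u + 1) % k][(v + 1) % k]
               for u in range(k) for v in range(k))

# Hypothetical input: a 3-by-2 experiment with entries in quarters.
E1 = [[F(3, 4), F(1, 4)],
      [F(1, 4), F(3, 4)],
      [F(1, 4), F(3, 4)]]
blocks = cyclic_completion(E1)
```

The unshifted experiment appears as the columns with h = 0 in each block, so it is recovered from the mixture as the component selected with probability 1/k.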

Lemma 2: Every cyclic-symmetric experiment is equivalent to a
mixture of cyclic-symmetric simple experiments.

Proof: If E is cyclic-symmetric, it can be given the form
$E = (A_1, A_2, \ldots)$, where each $A_h = (a^h_{ij})$ is a square
cyclic-symmetric matrix. Let $g_h = \sum_j a^h_{ij}$, and let $E_h = (1/g_h) A_h$,
for each h. Then $E_h$ is a cyclic-symmetric simple experiment,
and E admits the decomposition $E = \sum_h \oplus\, g_h E_h = (A_1, A_2, \ldots)$.

Example: In Example 1, E is c.s. and has the simple c.s. components
E*, E**.
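
The decomposition in the proof of Lemma 2 is a one-line normalization per block. The following sketch assumes the c.s. experiment is already given as a list of its square blocks $A_h$; the function name and the sample blocks (taken from the reading of Example 1 with entries in twelfths) are illustrative assumptions.

```python
from fractions import Fraction as F

def simple_components(blocks):
    """Split a c.s. experiment (A_1, A_2, ...) into weights g_h and
    simple c.s. experiments E_h = A_h / g_h."""
    weights, components = [], []
    for A in blocks:
        g = sum(A[0])   # every row of a cyclic-symmetric block has the same sum
        weights.append(g)
        components.append([[a / g for a in row] for row in A])
    return weights, components

def twelfths(rows):
    return [[F(n, 12) for n in row] for row in rows]

A1 = twelfths([[3, 1, 1], [1, 3, 1], [1, 1, 3]])
A2 = twelfths([[1, 3, 3], [3, 1, 3], [3, 3, 1]])
weights, components = simple_components([A1, A2])
```

With these blocks the weights come out as 5/12 and 7/12, and the components have first rows (3/5, 1/5, 1/5) and (1/7, 3/7, 3/7).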

3.  The  partial  ordering  of  simple  cyclic-symmetric  experiments. 

For  any  fixed  number  k  of  hypotheses,  we  consider  in  this 
Section  simple  c.s.  experiments  and  their  partial  ordering 
defined  as  above.   For  such  experiments,  E  contains  E*  if  and 
only if E* = EQ for some square stochastic matrix Q; our
primary  purpose  is  to  interpret  this  partial  ordering  more 
explicitly  in  terms  of  the  forms  of  the  c.s.  matrices  representing 
such  experiments.   It  will  suffice  for  our  purpose  to  illustrate 
the  general  case  by  a  detailed  discussion  of  the  case  k  =  3. 

Form 1: Let

$$E_L = c \begin{pmatrix} 1 & L & L \\ L & 1 & L \\ L & L & 1 \end{pmatrix},$$

where c = 1/(1+2L) and L is any number satisfying $0 \le L \le 1$.
The product of two such
matrices, with respective parameters L and L', is easily found
to be $E_{L''}$, which is of the same form with L'' = (L+L'+LL')/(1+2LL').
We have L'' = 1 only if L or L' = 1. We have $L'' \ge \max(L, L')$, with
equality only if L or L' = 0 (or L'' = 1). Thus the class of
experiments of form 1 is simply ordered, with smaller values of L
representing more informative experiments.
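
The composition rule for form 1 can be verified with exact arithmetic. This is a small illustrative check, with hypothetical function names and arbitrarily chosen parameter values L = 1/2 and L' = 1/3.

```python
from fractions import Fraction as F

def form1(L):
    """Experiment of form 1: c * [[1, L, L], [L, 1, L], [L, L, 1]], c = 1/(1+2L)."""
    c = 1 / (1 + 2 * L)
    return [[c if i == j else c * L for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][h] * B[h][j] for h in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def composed_parameter(L, Lp):
    """L'' = (L + L' + L L') / (1 + 2 L L')."""
    return (L + Lp + L * Lp) / (1 + 2 * L * Lp)

L, Lp = F(1, 2), F(1, 3)
product = matmul(form1(L), form1(Lp))
L2 = composed_parameter(L, Lp)
```

Here `L2` is 3/4 and `product` coincides with `form1(L2)`, so the experiment with parameter 1/2 contains the one with parameter 3/4, in line with the stated ordering.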

Form 2: Let

$$E^*_L = c^* \begin{pmatrix} L & 1 & 1 \\ 1 & L & 1 \\ 1 & 1 & L \end{pmatrix},$$

where c* = 1/(2 + L) and L is any number satisfying $0 \le L \le 1$.
The product of such a
matrix with one of form 1 above is easily found to be an
experiment $E^*_{L''}$ of form 2, with $L'' = (L + 2L')/(1 + L' + LL') \ge L$.
Thus the class of experiments of form 2 is simply ordered,
with smaller values of L representing more informative experiments.

Form 3: Let

$$E = c \begin{pmatrix} 1 & L & L' \\ L' & 1 & L \\ L & L' & 1 \end{pmatrix},$$

where c = 1/(1+L+L'),
with $0 \le L' \le L \le 1$. This includes the preceding forms as
special cases and in fact includes all c.s. simple experiments
with k = 3. We proceed to consider the partial ordering of
such experiments, writing $E = (p_{ij})$.

For each such experiment E, let $\alpha = p_{12} + p_{13}$. It is
easily verified that the parameter $\alpha$ of E is the common coordinate
of one of the points of the $(\alpha_1, \alpha_2, \alpha_3)$-surface which represents
E in the canonical form described in Section 1 above. In
particular, $\alpha$ is the probability (under each hypothesis) of an
error, when the rule of maximum likelihood is used to choose one
hypothesis on the basis of one observation from E (with
equiprobable randomization in cases of non-unique maxima of the
likelihood function). This inference rule is an admissible
one, since it is readily derived as a Bayes solution of the
problem described, with respect to the uniform prior distribution
g = (1/3, 1/3, 1/3).

Again, for each such experiment E, let $\beta = p_{13}$. For the
problem of giving a confidence set estimator which excludes at
least one of the hypotheses and has maximum probabilities of
including the true hypothesis, a Bayes solution with respect
to the uniform prior distribution gives the maximum likelihood
rule which excludes just the least likely hypothesis (with
exclusion of one chosen by equiprobable randomization in cases
of non-unique minima). The error-probability of this rule, under
each hypothesis, is the parameter $\beta$ of E.

A simple necessary condition for E to contain E* is that
$\alpha \le \alpha^*$ and $\beta \le \beta^*$; for if E failed to achieve
error-probabilities at least as small as E* in the two specific problems just
described, it would fail to contain E*. This condition may
be described thus: if E contains E*, then the distributions
in E are at least as highly concentrated as those in E*, in the
sense that under each hypothesis the most probable outcome
has probability at least as high, and the least probable outcome
has probability at least as small, in E as in E*.
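
The two error-probabilities just described can be computed for any experiment of form 3. The sketch below is illustrative: it assumes the form-3 matrix has rows that are cyclic shifts of c(1, L, L'), uses equiprobable randomization on ties as the text describes, and picks the arbitrary parameter values L = 1/2, L' = 1/4. Function names are hypothetical.

```python
from fractions import Fraction as F

def form3(L, Lp):
    """Simple c.s. experiment for k = 3 with first row c * (1, L, L')."""
    c = 1 / (1 + L + Lp)
    first_row = [c, c * L, c * Lp]
    return [[first_row[(j - i) % 3] for j in range(3)] for i in range(3)]

def ml_error(E, i):
    """Prob. under H_i that the maximum likelihood choice is not H_i."""
    k, err = len(E), 0
    for j in range(len(E[0])):
        col = [E[h][j] for h in range(k)]
        best = [h for h in range(k) if col[h] == max(col)]
        p_correct = F(1, len(best)) if i in best else 0
        err += E[i][j] * (1 - p_correct)
    return err

def exclusion_error(E, i):
    """Prob. under H_i that H_i is the (least likely) hypothesis excluded."""
    k, err = len(E), 0
    for j in range(len(E[0])):
        col = [E[h][j] for h in range(k)]
        worst = [h for h in range(k) if col[h] == min(col)]
        if i in worst:
            err += E[i][j] * F(1, len(worst))
    return err

E = form3(F(1, 2), F(1, 4))
alpha = [ml_error(E, i) for i in range(3)]
beta = [exclusion_error(E, i) for i in range(3)]
```

For this choice of parameters each entry of `alpha` equals $p_{12} + p_{13}$ and each entry of `beta` equals $p_{13}$, the same under every hypothesis, as the cyclic symmetry requires.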

To illustrate that the preceding condition is not sufficient
for comparability of experiments, consider two such experiments
$E_1$ and $E_2$ [display of the two matrices missing].
The above condition is satisfied, since $\alpha_1 = \alpha_2$ = [illegible] and
$\beta_1$ = [illegible] < $\beta_2$ = [illegible]. Consider a third specific inference problem,
that of giving a point-estimate of the parameter i = 1, 2, or 3,
with estimators to be appraised in terms of their probabilities
of each of the possible kinds of errors. (The parameter $\alpha$ is
simply a total probability of incorrect estimates.) Any
non-randomized estimator may be represented by a function d = d(j)
of the outcome j which takes values in the range of i. Any
randomized estimator may be represented as a "mixture" of such
non-randomized estimators; for example, if d(j) and d'(j) are
non-randomized estimators, then $d''(j) = c\,d(j) \oplus (1-c)\,d'(j)$
represents the randomized estimator which, when j is observed,
takes the value d(j) with probability c and takes the value d'(j)
with probability (1-c). For any estimator d = d(j), possibly
randomized, let $\alpha_{iu} = \alpha_{iu}(d) = \alpha_{iu}[d(\cdot)] = \mathrm{Prob}[d(X) = u \mid H_i]$ if
$u \ne i$, $\alpha_{iu} = 0$ if u = i, for u, i = 1, 2, 3.

When $E_2$ is used, the (admissible maximum likelihood) estimator
d(j) = j has all error-probabilities $\alpha_{iu}$ = [illegible], $u \ne i$. When $E_1$
is used, it can be verified that every estimator (including
randomized estimators) has at least one error-probability
exceeding [illegible]. Thus $E_1$ does not contain $E_2$.

For experiments of the general form 3, we offer here no
conveniently-applicable necessary and sufficient conditions
for comparability of experiments in terms of their parameters
L and L'; nor will this be necessary for our purposes.

For k > 3, similar considerations are applicable. For
example, among c.s. experiments of the form $p_{11} > p_{12} = p_{13} = \cdots = p_{1k}$
it is easily verified as above that there is a simple ordering
by the parameter $p_{11}$; the latter parameter is the probability,
under each hypothesis, that the most likely hypothesis will be
the true hypothesis.

4. Inference methods with intrinsic justifications. In the
preceding Section, for various c.s. simple experiments there were
described a number of methods of statistical inference or
decision-making, including point- and confidence-set estimators;
in addition, a number of methods of testing hypotheses were
represented implicitly by the confidence-set methods described,
in virtue of well-known simple relations between the two kinds
of methods. For each of these methods, a more or less complete
description was given of the probabilities of the various
possible appropriate and inappropriate inferences or decisions
under respective hypotheses. The complete description of such
relevant probabilistic properties of a given inference method
can in principle always be determined; and for a given purpose
of application, various possible inference methods can in
principle be evaluated and compared on the basis of such
probabilistic properties. Such considerations are an extension of
those discussed in detail for the case k = 2 in Sections 7-9 of
the preceding Part I, B: "Inference methods with probabilistic
justifications." Each such probability is defined directly in
the experiment under consideration; and each such
error-probability can be interpreted in terms of relative frequencies
of errors, under respective hypotheses, in conceptually-possible
indefinite repetitions of the given experiment. We turn
now to an extension of the preceding Part I, C:
"Inference methods with intrinsic justifications". Since our
discussion here takes a somewhat different form, it will
complement the earlier discussion of the case k = 2; for many details
of interpretation, reference to the earlier discussion may be
useful even in connection with cases k > 2.

Lemmas 1 and 2 of Section 2 above play a basic role in
support of the following interpretations. According to
Lemma 2, any c.s. experiment E may be regarded as a mixture of
c.s. simple experiments $E_h$. It follows that any outcome of E may
be regarded as: (a) the selection of a component $E_h$ of E,
determined randomly according to probabilities $g_h$ which are
fixed, independently of the hypotheses; followed by: (b) the
observation of a single outcome of the selected experiment $E_h$.
We observe: (1) The likelihood function on any observed outcome
of E (that is, the column of the stochastic matrix E
corresponding to any observed outcome of E) is necessarily the same
as (proportional to) the likelihood function when that outcome
is regarded as an outcome of a selected component $E_h$.
(2) The likelihood function on any observed outcome of E
determines, essentially uniquely, the form of the simple c.s. component
$E_h$ of E from which the observed outcome could have arisen.
(A single column of a simple c.s. experiment, specified up to
a constant of proportionality, determines the form of that
experiment essentially uniquely.) (3) Since the selection of
a particular component $E_h$ of E provides no information relevant
to the hypotheses (although it determines the strength and nature
of relevant evidence which can be provided by an outcome of
$E_h$), it follows that for purposes of informative inference,
any outcome of any c.s. experiment can and should be
interpreted in the same way as if it were an outcome of the essentially
unique simple c.s. experiment determined by the observed
likelihood function. The variety of possible and possibly-useful
interpretations of outcomes of simple c.s. experiments was
illustrated in part in the preceding Section 3; such
interpretations were expressed there in terms of error-probabilities,
admitting frequency interpretations, defined in the simple
c.s. experiment under consideration.

To establish a similar conclusion for experiments which
are not necessarily c.s., we use Lemma 1, and the notation of
its proof, in Section 2 above. The evidential interpretation
of any outcome X = j of any experiment $E_1$ should clearly
coincide with the evidential interpretation of the following
outcome: the random selection of $E_1$ as a component of any
mixture experiment E (of which $E_1$ is in fact a component),
followed by use of $E_1$ and observation of its outcome X = j.
The introduction into our discussion of any such experiment
E containing $E_1$ as a component is an arbitrary step; however,
the preceding comment shows that this step does not affect the
recognizable evidential status of any outcome j of $E_1$; and
the following comments show that this step is useful in
throwing additional light on the evidential character of such
an outcome. We take E to have the form defined in the proof
of Lemma 1: $E = \sum_h \oplus\, \frac{1}{k} E_h$, where the components $E_h$ are
defined as before. We are considering the evidential character
of outcome j of $E_1$, and we have agreed that this is the
same as the evidential character of the outcome "$E_1$ and its
j-th outcome" of the mixture experiment E. But E is c.s.,
and therefore the conclusion established above is applicable
to its outcome "$E_1$ and its j-th outcome". Thus we conclude:
For purposes of informative inference, any outcome of any
experiment can and should be interpreted in the same way as
an outcome of a simple c.s. experiment having the same
likelihood function; the structure of the original experiment is
irrelevant, apart from determination of the likelihood function
on the observed outcome.

If k = 2, a simple c.s. experiment is a symmetric simple
binary experiment, which may be characterized
as discussed in Part I above. For k = 3, a simple c.s.
experiment $E = (p_{ij})$, with $p_{11} \ge p_{12} \ge p_{13}$, may be considered
characterized by the values of its "error-probabilities"
$p_{12}$, $p_{13}$; for this case, evidential interpretations of various
specific forms, and their qualitative and quantitative properties
in relation to error-probabilities, were described in detail
in Section 3 above. Similar considerations hold for k > 3.

5. Intrinsic confidence methods. One useful method of
expressing part of the evidential meaning of an outcome of a
simple c.s. experiment is by use of inference statements of
the confidence set form. For any such experiment $E = (p_{ij})$,
with $p_{11} > p_{12} \ge \cdots \ge p_{1k}$, the maximum likelihood estimator
of the unknown hypothesis $H_i$ is formally a confidence set
estimator with confidence coefficient $p_{11}$. The two most
likely hypotheses, on any observed outcome, constitute a
confidence set with coefficient $(p_{11} + p_{12})$; and so on. The
set which, on any outcome, includes all but the least likely
hypothesis is a confidence set with coefficient $(1 - p_{1k})$.
If, for example, $p_{11} = p_{12} > p_{13} \ge \cdots \ge p_{1k}$, then all such
confidence sets except the first can be defined in the same
way; construction of a maximum-likelihood confidence set
consisting of a single hypothesis could also be given formally
in this case, but would be of little interest for typical
purposes of informative inference.
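
The nested confidence sets described above can be read off directly from one observed column of a simple c.s. experiment. The sketch below is a hedged illustration: it assumes the observed likelihood function is given up to proportionality, normalizes it to sum to one (the columns of a simple c.s. experiment sum to one), and breaks ties between equally likely hypotheses by index, a case the text treats separately. The function name and the sample likelihood values are hypothetical.

```python
from fractions import Fraction as F

def maximum_likelihood_confidence_sets(likelihood):
    """Return [(set of hypothesis indices, confidence coefficient), ...] for
    the sets of the 1, 2, ..., k-1 most likely hypotheses (1-based indices)."""
    total = sum(likelihood)
    ranked = sorted(((value / total, i + 1) for i, value in enumerate(likelihood)),
                    reverse=True)
    sets, members, coefficient = [], [], 0
    for p, index in ranked[:-1]:          # the full set of all k is omitted
        members.append(index)
        coefficient += p                  # p_11, then p_11 + p_12, and so on
        sets.append((sorted(members), coefficient))
    return sets

# Hypothetical observed likelihood function for k = 3, known up to a constant.
sets = maximum_likelihood_confidence_sets([F(2, 10), F(1, 20), F(3, 20)])
```

For this input the normalized likelihoods are 1/2, 1/8, 3/8, so the single most likely hypothesis carries coefficient 1/2 and the pair of the two most likely carries 7/8, which is one minus the smallest normalized likelihood.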

Such maximum likelihood confidence sets are optimum in the
sense (a) that each such set-estimator has, under each hypothesis,
the largest possible probability of including the true
hypothesis, among all (possibly randomized) set-estimators
whose confidence-sets are restricted to contain the same
number or fewer points (hypotheses); and in the sense (b)
that each such set-estimator has probabilities of including
various false hypotheses, when respective hypotheses are true,
which cannot be strictly improved, except by reduction of
the confidence coefficient.

Such confidence sets were illustrated for the case k = 3
in Section 3 above. The set of confidence coefficients of such
estimators characterizes the structure of a simple c.s. experiment;
for example, for k = 3, the respective confidence coefficients
are $p_{11}$ and $p_{11} + p_{12}$. From these values, we can immediately
calculate $p_{12}$ and $p_{13}$, and thus determine the form of $E = (p_{ij})$.

If  the  experiment  E  whose  outcome  is  to  be  interpreted 
happens  to  be  of  the  simple  c.s.  form,  then  inference  methods 
of  the  preceding  kinds  are  confidence  methods   (confidence 
set  estimation  methods)  of  the  kind  introduced  by  Neyman: 
the  confidence  coefficients  and  error-probabilities  referred 
to are then defined directly in terms of the structure of $E = (p_{ij})$
as just described. These confidence coefficients and error-
probabilities  admit  the  usual  frequency  interpretations,  in 
terms  of  conceptually  possible  repetitions  of  the  given 
experiment  E. 

If  the  experiment  E  happens  to  be  c.s.  but  not  simple,  then 
it  is  (by  Lemma  2  of  Section  2  above)  equivalent  to  a  mixture 
of  simple  c.s.  experiments.   In  this  case  the  conclusion 
of  the  preceding  Section  can  be  given  the  interpretation: 
Any  outcome  of  E  should  be  interpreted  as  an  outcome  of  the 
corresponding simple c.s. component of E; in other words, any
outcome  of  E  should  be  interpreted  "conditionally"  with  the 
selected  simple  c.s.  component  of  E  as  the  experimental  frame 
of  reference.   In  such  a  case,  when  confidence  methods  like 
those  described  above  are  used,  based  upon  the  simple  c.s. 
experiment  determined  by  the  likelihood  function  on  an  observed 
outcome  of  E,   these  methods  are  formally  an  example  of 
conditional confidence methods. Conditional applications of inference
methods of standard kinds are widely used, and are
generally  considered  appropriate  for  purposes  of  informative 
inference,  when  an  appropriate  conditional  experimental  frame 
of  reference  is  recognized.   Decomposition  theorems  such  as 
Lemma  2  and  its  analogues  may  be  considered  mathematical  analyses 
of  the  structures  of  statistical  experiments  which  extend 
considerably  the  range  of  recognizably  appropriate  conditional 
frames of reference for purposes of informative inference. The
confidence coefficients and error-probabilities of such conditional
confidence methods admit the usual frequency interpretations, as
conditional probabilities, in terms of conceptually possible
repetitions  of  the  given  experiment  E,  conditional  on  the 
selection  of  the  particular  simple  c.s.  component  of  E  which 
corresponds  to  the  observed  outcome. 

If  the  experiment  E  is  not  c.s.,  the  conclusion  of  the 
preceding  Section  nevertheless  supports  interpretations  of  an 
outcome of E as if it were an outcome of the simple c.s. experiment
E'  determined  by  the  likelihood  function  on  the  observed  outcome. 
In  general,  E'  is  not  a  component  of  E;  and  if  such  interpretations 
are expressed, for example, by maximum likelihood confidence
methods based on E', then the confidence coefficients and error-
probabilities of such methods, which are defined in E', will
not  in  general  be  interpretable  as  probabilities  or  conditional 
probabilities  defined  in  E.   For  this  reason,  we  designate  such 
methods  in  general  as  intrinsic  confidence  methods.   Intrinsic 
confidence  methods  constitute  an  extension  and  generalization 
of  confidence  methods  and  conditional  confidence  methods, 
appropriate  for  purposes  of  informative  inference.  (An  intrinsic 
confidence  method  can  always  be  regarded  as  a  conditional 
confidence method in the hypothetical formal sense that, in
some  hypothetical  c.s.  experiment  which  contains  the  given 
experiment  E  as  a  component  (Lemma  1),  the  intrinsic  confidence 
method  is  also  a  conditional  confidence  method.   This  comment 
should not be confused with the development of the principal
conclusion of Section 4 above.)
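
As a rough illustration of an intrinsic confidence method, the sketch below takes an observed likelihood function alone and returns the sets of the m most likely hypotheses with their intrinsic confidence coefficients. It assumes that the simple c.s. experiment E' is the one having the normalized likelihood function as a column, so that each coefficient is a partial sum of the normalized likelihoods; the function name and the 0-based indexing are illustrative choices, not the paper's.

```python
from fractions import Fraction

def intrinsic_confidence(likelihood):
    """Return [(set of hypotheses, coefficient)] for m = 1, ..., k-1.

    `likelihood` lists the observed likelihood function, one value per
    hypothesis, up to an arbitrary positive constant factor.
    """
    total = sum(likelihood)
    normalized = [Fraction(v) / total for v in likelihood]
    # Rank hypotheses from most to least likely (ties broken by index).
    ranked = sorted(range(len(normalized)), key=lambda i: (-normalized[i], i))
    results, running = [], Fraction(0)
    for m in range(1, len(ranked)):
        running += normalized[ranked[m - 1]]
        results.append((set(ranked[:m]), running))
    return results

# Hypothetical likelihood functions, given up to a constant factor.
two_hypotheses = intrinsic_confidence([1, 99])
three_hypotheses = intrinsic_confidence([2, 5, 3])
```

Only the likelihood function enters the calculation; the structure of the experiment from which the outcome came is never used. Ties in the likelihood function would need the separate treatment noted in Section 5.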

It  should  be  noted  that  for  a  given  k,  different  outcomes 
may  give  the  same  intrinsic  confidence  set  with  the  same 
intrinsic confidence coefficient, although these outcomes have
different  likelihood  functions  which  do  not  coincide  completely. 
In  such  a  case,  other  intrinsic  confidence  sets  based  on  the 
respective outcomes will fail to coincide, reflecting differences
in likelihood functions. This illustrates that in general
any  single  intrinsic  confidence  statement  expresses  only  part 
of  the  evidential  significance  of  an  outcome,  and  is  only  an 
incomplete  summary  of  the  likelihood  function. 

6. An interpretation of the "principle of insufficient reason".
In  the  method  of  treating  statistical  inference  problems 
which was initiated by Bayes and Laplace, "uniform prior
probabilities" were postulated for the respective statistical
hypotheses  under  consideration,  and  the  formal  "posterior 
probabilities",  calculated  by  Bayes'  formula,  were  interpreted 
as  giving  inferences  from  observational  data  to  the  hypotheses 
in  the  absence  of,  or  independent  of,  background  knowledge  or 
prior  opinions  concerning  the  hypotheses.   Evidently  the 
intention  of  those  who  initiated  and  have  used  this  method  has 
been  to  treat,  in  suitably  objective  and  meaningful  terms,  the 
problem of informative inference, that is, the problem of
evidential  interpretation  of  experimental  outcomes,  as  it 
occurs  in  empirical  research  situations.  Following  Laplace, 
the  method  was  widely  accepted  during  the  nineteenth  century. 
Analysis  and  criticism  of  the  possible  ambiguity  of  the  notion 
of  "uniformity"  of  prior  probabilities,  and  of  the  unclear  nature 
of  such  "prior  probabilities"  in  general,  has  led  to  a  general 
rejection  of  this  method  throughout  the  present  century. 
(The use of prior probabilities, not in general "uniform",
to express background knowledge and/or prior opinion, continues
to be recommended by a distinguished minority of modern
statisticians. However, such recommendations are not addressed directly
to the problem of informative inference as described above;
but to problems of using experimental outcomes, along with
background knowledge, prior opinion, and information about specific
features of an inference situation such as goals, practical
consequences, etc., in order to reach appropriate decisions
or  conclusions.) 

It  is  at  least  a  striking  coincidence  that  inference 
methods  based  upon  the  "principle  of  insufficient  reason", 
in problems having suitable symmetry (or analogous) properties,
coincide in form (although they differ in interpretation)
with  modern  inference  methods  derived  without  use  of  prior 
probability  notions.   For  example,  if  an  experiment  E  happens 
to  be  simple  c.s.,  then  formal  assignment  of  prior  probabilities, 
each equal to 1/k, to the hypotheses $H_i$, leads to "posterior
most probable" sets of hypotheses which coincide with the
(optimum) maximum likelihood confidence sets found above;
and  each  such  set  has  a  posterior  probability  which  is 
numerically  equal  to  the  corresponding  confidence  coefficient. 
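
The coincidence can be verified in a small case. This is a minimal sketch, assuming 0-based indices and a hypothetical first row (.6, .3, .1): with prior probabilities 1/k, Bayes' formula applied to a column of a simple c.s. experiment returns that column unchanged, so the "posterior most probable" sets and their posterior probabilities reproduce the confidence sets and coefficients.

```python
from fractions import Fraction

def simple_cs_experiment(first_row):
    """Cyclic-symmetric matrix: entry (i, j) is first_row[(j - i) mod k]."""
    k = len(first_row)
    return [[first_row[(j - i) % k] for j in range(k)] for i in range(k)]

def posterior_uniform_prior(E, j):
    """Bayes' formula with prior probability 1/k on each hypothesis."""
    k = len(E)
    joint = [Fraction(1, k) * E[i][j] for i in range(k)]
    total = sum(joint)
    return [v / total for v in joint]

# Hypothetical first row, chosen only for illustration.
first_row = [Fraction(6, 10), Fraction(3, 10), Fraction(1, 10)]
E = simple_cs_experiment(first_row)
posterior = posterior_uniform_prior(E, 1)
column = [E[i][1] for i in range(3)]
# Posterior probability of the two most probable hypotheses.
top_two = sum(sorted(posterior, reverse=True)[:2])
```

Here `posterior` equals `column`, and `top_two` equals $p_{11} + p_{12}$, the confidence coefficient of the two-point maximum likelihood confidence set.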

Now  the  analysis  of  preceding  sections  shows  that  for 
purposes  of  informative  inference,  whatever  the  structure  of  E, 
its  outcomes  can  and  should  be  interpreted  as  outcomes  of 
corresponding simple c.s. experiments. When such an appropriate
experimental  frame  of  reference  for  interpreting  an  experimental 
outcome  is  adopted,  as  it  can  and  should  be  to  serve  the  apparent 
intention  of  those  who  initiated  use  of  the  "principle  of 
insufficient  reason",  then  the  formal  application  of  the  latter 
principle  can  be  regarded  as  a  formal  algorithm  for  calculating 
the  intrinsic  confidence  sets  and  coefficients  which  themselves 
have the independent justifications given above; and the term
"posterior probability (determined with use of uniform prior
probabilities)" may be regarded as a traditional terminology, in
place of which we use the term "intrinsic confidence coefficient",
with the meaning established in the preceding Section.

Thus  the  "principle  of  insufficient  reason",  in  such 
problems  and  uses,  must  be  regarded  as  one  of  those  "principles" 
which,  in  various  mathematical  disciplines,  have  been  recognized 
and  used  to  obtain  "correct"  results,  in  advance  of  perfectly 
clear  formulations  of  the  problems  considered  and  of  the  precise 
nature  of  "correct  solutions"  to  such  problems.   (In  experiments 
for  an  infinite  number  of  hypotheses,  the  "principle  of 
insufficient  reason"  has  been  interpreted  and  used,  despite  the 
technical difficulty of specifying the mathematical meaning of
"uniform prior probabilities", in such a way that the likelihood
function  is  taken  to  be  the  elementary  "posterior  probability 
function"  with  respect  to  some  "natural",  "uniform"  measure  on 
the  parameter  space;  while  it  is  not  necessary  for  mathematical 
reasons that the latter be probability measures, there remains
the  question  of  interpretation  and  possible  ambiguity  of 
"natural"  or  "uniform".   We  defer  discussion  of  experiments  for 
an  infinite  number  of  hypotheses.) 

In  retrospect,  the  early  broad  usage  of  the  term  "probability" 
is  seen  to  have  embraced  at  least  the  following  kinds  of  meanings 
which  now  seem  clear  and  distinct  (although  we  have  frequent 
occasion to use several of them in discussion of a single problem
of inference):

(a)  Probability  as  used  above  to  specify  mathematical  models 
of  statistical  hypotheses;  admitting  conceptual  frequency  inter- 
pretations. 

(b) Prior probability (not in general uniform; and related
posterior  probability),  as  sometimes  used  to  express  prior  opinion 
and  background  knowledge,  brought  to  an  inference  problem.   This 
aspect  of  inference  situations  has  not  been  represented  in  our 
discussion,  because  it  is  not  a  part  of  the  problem  of  informative 
inference  as  such. 

(c)  Posterior  probability  (calculated  from  formal  uniform  prior 
probabilities; "principle of insufficient reason"). In place
of  this  traditional  usage,  we  have  the  preferable  term  "intrinsic 
confidence  (coefficient)"  which  is  defined  as  above  in  terms  of 
the  more  basic  likelihood  function  and  the  interpretations  established 
for  the  latter.   In  brief,  "intrinsic  confidence",  and  the  more 
basic "likelihood" with its interpretations, explicate and replace
this  traditional  usage  of  "posterior  probability".   Similarly, 
"uniform  prior  probability"  could  well  be  replaced  by  "uniform 
prior  likelihood",  the  latter  denoting  the  constant  likelihood 
function which properly represents absence of informative observations
(or  presence  of  hypothetical  uninformative  outcomes)  at  the  outset 
of  an  experiment.   Then  the  traditional  usage  would  be  represented 
intact,  with  the  term  "probability"  replaced  by  "likelihood" 
throughout, except for usage (a), and with all attention directed
as  above  to  the  usual  likelihood  function. 

7.   An  interpretation  of  Fisher's  "fiducial  argument". 
For any experiment $E = (p_{ij})$, we can assume without loss
of generality that the range of j is doubly infinite, $-\infty < j < \infty$,
so that each row of $(p_{ij})$ is doubly infinite, with
$\sum_{-\infty < j < \infty} p_{ij} = 1$
for each i. The range of i may be, until otherwise specified,
either finite ($1 \le i \le k$), or countably infinite ($1 \le i < \infty$),
or doubly infinite ($-\infty < i < \infty$). We note that j is a sufficient
statistic (not in general minimal sufficient), as is any one-to-one
function of j. Any real-valued function t = t(j,i) of i and
j is called a quasistatistic; a quasistatistic becomes a statistic
when its argument i is given any fixed value. A sufficient
quasistatistic is one which becomes a sufficient statistic when i is
fixed, in turn, at each of its possible values. (A minimal
sufficient quasistatistic is defined analogously.) A stationary
quasistatistic is one which determines statistics each having
the same distribution in the sense that, letting
$H(t,i) = \mathrm{Prob}(t(X,i) \le t \mid H_i)$, we have $H(t,i) = H(t,1)$, for each
t and i. A pivotal quasistatistic is one which is both stationary
and  sufficient. 

If E is such that $p_{ij} = p_{1,\,j-i+1}$, i is called a translation
parameter, since for each i the distribution of X under $H_i$
coincides with the distribution of (X+i-1) under $H_1$. We observe
that in any such experiment, the quasistatistic t(j,i) = j-i+1
is pivotal.

In  the  case  of  a  simple  c.s.  experiment  for  a  finite  number 
k of simple hypotheses, represented by a square matrix $E = (p_{ij})$,
the quasistatistic

$t(j,i) = \begin{cases} j-i+1, & \text{if the latter is positive,} \\ j-i+1+k, & \text{otherwise,} \end{cases}$

is  pivotal.   If  the  k  outcomes  j  of  such  an  experiment  are 
regarded  as  cyclically  ordered,  as  k  uniformly- spaced  points  on 
the circumference of a circle, then i may be regarded as a translation
parameter  in  an  extended  use  of  that  term.   We  shall  call  the 
parameter  i  of  any  simple  c.s.  experiment  a  rotation  parameter, 
since for each i the distribution of X (mod. k) under $H_i$ coincides
with the distribution of (X+i-1) (mod. k) under $H_1$.
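
That the quasistatistic above is stationary in a simple c.s. experiment can be checked directly. The sketch below is illustrative only: it assumes a hypothetical first row (.5, .25, .125, .125), uses the paper's 1-based indices i and j for the quasistatistic, and stores the matrix 0-based. It computes the distribution of t(X,i) under each $H_i$ and compares it with the first row.

```python
from fractions import Fraction

def simple_cs_experiment(first_row):
    """Cyclic-symmetric matrix, stored 0-based: first_row[(j - i) mod k]."""
    k = len(first_row)
    return [[first_row[(j - i) % k] for j in range(k)] for i in range(k)]

def t(j, i, k):
    """The quasistatistic of the text, with 1-based i and j."""
    v = j - i + 1
    return v if v > 0 else v + k

def distribution_of_t(E, i):
    """Distribution of t(X, i) under H_i, as a list indexed by t = 1..k."""
    k = len(E)
    dist = [Fraction(0)] * (k + 1)
    for j in range(1, k + 1):
        dist[t(j, i, k)] += E[i - 1][j - 1]
    return dist[1:]

# Hypothetical first row, chosen only for illustration.
first_row = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
E = simple_cs_experiment(first_row)
distributions = [distribution_of_t(E, i) for i in range(1, 5)]
```

Each entry of `distributions` equals `first_row`, so the distribution of t(X,i) does not depend on i.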

If  E  is  simple  c.s.,  then  each  of  its  columns  (possible 
likelihood  functions)  is,  in  the  formal  mathematical  sense,  a 
probability  distribution  over  the  possible  values  i  of  the  unknown 
parameter.  The  same  is  true  of  any  column  of  any  experiment  in 
which  i  is  a  translation  parameter  having  a  doubly-infinite  range. 
If  E  is  an  experiment  of  one  of  these  two  forms,  then  an  example 
or  analogue  of  Fisher's  "fiducial  argument"  gives  the  following 
definition:   When  an  outcome  X  =  j  has  been  observed,  take  the 
$j$-th column of E as the "fiducial probability distribution" of the
parameter i. (We note that, in the case of such experiments, these
distributions coincide formally with those obtained by formal
application  of  the  "principle  of  insufficient  reason"  discussed 
in  Section  6  above.) 

The  "fiducial  argument"  by  which  a  "fiducial  distribution" 
has  usually  been  defined  may  be  illustrated  in  the  present  case,  of 
translation  and  rotation  parameters  i,  with  use  of  the  pivotal 
quasistatistics  (usually  called,  in  this  context,  "pivotal 
quantities")  t(j,i)  defined  above,  as  follows:   For  each  i  and 
integer t, we have $\mathrm{Prob}(t(X,i) \le t \mid H_i) = H(t,i) = H(t,1)$, which
is independent of i. $H(t,1)$ is formally a cumulative probability
distribution with argument t; if X = j is an observed outcome of
E, then the function G(i,j) of i defined by $G(i,j) = 1 - H(t(j,i)-,\,1)$
is formally a cumulative probability distribution function, termed
the "fiducial distribution of the parameter i, when X = j has
been observed". In the present cases we have the corresponding
discrete elementary "fiducial probability function"
$g(i,j) = G(i,j) - G(i-1,j) = p_{ij}$, as stated above, which coincides
with the likelihood function. In such cases, probability
statements about i based formally on the fiducial probability
function g(i,j) must parallel in form both confidence statements
and statements based formally on the "principle of insufficient
reason".
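
For a translation parameter the identity $g(i,j) = p_{ij}$ can be written out step by step. The following is a sketch, assuming that $H(t-,1)$ denotes the left-hand limit $\mathrm{Prob}(t(X,1) < t \mid H_1)$, and using $t(j,i) = j-i+1$, $t(X,1) = X$, and $p_{ij} = p_{1,\,j-i+1}$ from the definitions above:

```latex
\begin{align*}
G(i,j) &= 1 - H(t(j,i)-,\,1)
        = \mathrm{Prob}(X \ge j-i+1 \mid H_1), \\
g(i,j) &= G(i,j) - G(i-1,j) \\
       &= \mathrm{Prob}(X \ge j-i+1 \mid H_1)
        - \mathrm{Prob}(X \ge j-i+2 \mid H_1) \\
       &= \mathrm{Prob}(X = j-i+1 \mid H_1)
        = p_{1,\,j-i+1} = p_{ij}.
\end{align*}
```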

Fiducial  methods  were  developed  by  Fisher  evidently  for  the 
purpose  of  treating  what  we  have  called  the  problem  of  informative 
inference.   For  experiments  for  a  finite  number  of  hypotheses, 
our  conclusion  in  Section  5  above  shows  that  an  appropriate  frame 
of reference for informative inferences is always provided by
a  simple  c.s.  experiment  determined  by  the  likelihood  function 
on  the  observed  outcome.   Evidently  the  adoption  of  such  a  frame 
of  reference  would  serve  the  general  intention  for  which  fiducial 
methods  have  been  developed.   Adoption  of  such  a  frame  of  reference 
would  also  extend  considerably  the  scope  of  formal  applicability 
of  the  "fiducial  argument",  since  the  conditions  of  its  applicability 
are  evidently  not  met  in  many  experiments  (for  k  hypotheses),  but 
in  any  simple  c.s.  experiment  fiducial  probabilities  can  be  defined 
formally,  as  above;  such  adoption  would  lead  to  fiducial  probability 
statements about i which always parallel in form intrinsic confidence
statements  defined  in  Section  5  above. 

The preceding discussion of fiducial methods has been
restricted  to  formal  definitions,  and  to  an  opinion  concerning 
the  general  purpose  to  which  the  methods  are  addressed.   It  has 
been  seen  that  "fiducial  probabilities"  defined  as  above  are, 
like  "posterior  probabilities"  (determined  by  use  of  the 
"principle  of  insufficient  reason"),  cases  of  mathematical 
probability  distributions,  defined  on  the  range  of  an  unknown 
parameter  i.   The  only  substantive  interpretation  which  the 
present  writer  can  suggest  for  the  term  "fiducial  probability" 
is  that  the  term  seems  to  be  an  instance  of  the  tradition  of 
broad  usage  of  "probability",  initiated  by  Bayes  and  Laplace  in 
the  different  form  discussed  in  the  preceding  Section  6,  and  used 
to  express  statements  of  informative  inference  about  unknown 
parameters.   It  seems  to  the  present  writer  that  the  problem 
of informative inference itself, for which evidently fiducial
methods have been developed, is clarified by the analysis of
Section 4 above, and served well and clearly by the intrinsic
confidence  methods  defined  and  interpreted  as  in  Section  5  above. 

The  scope  of  formal  correspondence  between  intrinsic  confidence 
methods  and  fiducial  methods  will  be  discussed  for  a  wider  class 
of  problems  in  a  following  part  of  this  paper.   In  the  light 
of  the  preceding  discussion,  it  will  not  be  altogether  unexpected 
if  intrinsic  confidence  limits  for  the  difference  of  means  in  the 
Behrens-Fisher problem exist and coincide in form with the
fiducial  limits  given  by  Fisher  for  that  problem. 

8. The relativity of intrinsic evidential interpretations
expressed in terms of error-probabilities.
Our use of simple cyclic-symmetric experiments as a frame
of  reference  has  played  a  technical  role  in  establishing  the 
principal  conclusion  of  Section  4  above,  concerning  the  basic 
status  and  role  of  the  likelihood  function  for  purposes  of 
informative inference. In addition we have found, in Section 5,
that simple c.s. experiments provide a convenient, useful frame
of reference for techniques, such as intrinsic confidence methods,
which express some of the evidential meaning of likelihood functions.
The  intrinsic  confidence  coefficients  associated  with  such  intrinsic 
confidence  statements,  and  analogous  error-probabilities  which  may 
be associated with other such intrinsic evidential interpretations
of experimental outcomes, are defined and meaningful only in
association  with  the  simple  c.s.  experiment  (determined  by  the 
likelihood  function)  which  may  conveniently  be  adopted  as  an 
appropriate  frame  of  reference  for  evidential  interpretations. 

Apart  from  convenience  and  simplicity,  however,  there  is  no 
reason of principle which recommends such frames of reference
as  uniquely  appropriate  for  interpreting  and  expressing  the 
evidential  meaning  of  an  observed  likelihood  function.   The 
latter  is  basic  and  is  itself  evidentially  meaningful,  and  its 
evidential  meaning  can  be  recognized  in,  and  expressed  in  terms  of, 
various  alternative  adequate  experimental  frames  of  reference. 

For example, if k = 2, an outcome j which gives the likelihood
function $(p_{1j}, p_{2j}) = (c, 99c)$, for any positive c, gives the
likelihood ratio statistic the value 99, and can be interpreted
by use of the simple c.s. experiment (simple symmetric binary
experiment)

$E = \begin{pmatrix} .99 & .01 \\ .01 & .99 \end{pmatrix}$

as a conveniently chosen frame of
reference. In the latter frame of reference, the outcome can be
characterized as supporting $H_2$ against $H_1$ with an evidential
strength associated with error-probabilities equal to .01.

(Such an inference statement is formally an example of an
intrinsic confidence method: On the basis of the outcome described,
regardless of the structure of the experiment from which it was
obtained, the hypothesis $H_2$ (or the set of parameter points i
consisting of the single point i = 2) constitutes an intrinsic
confidence set (or in this case an intrinsic confidence point)
estimate, having intrinsic confidence coefficient .99.)

However, the same outcome can be characterized just as
properly, although perhaps less conveniently for some purposes,
as evidentially equivalent to the second outcome of the
asymmetric simple binary experiment

$E' = \begin{pmatrix} 98/99 & 1/99 \\ 0 & 1 \end{pmatrix}$

in which "false negatives" are impossible but "false positives" have
probability 1/99 = .0101 > .01.

The structure of E' is characterized by the two error-
probabilities .0101, 0, while the structure of E is characterized
by the two error-probabilities .01, .01. This example illustrates
that when part of the evidential meaning of a likelihood function
is expressed by use of intrinsically-associated error-probabilities
(as in intrinsic confidence methods), the specification of the
chosen experimental frame of reference (e.g. a simple c.s.
experiment) must be included as an essential part of the interpretive
statements. No doubt, the choice of simple c.s. experiments, and
their  analogues  in  more  general  problems,  will  usually  be  convenient; 
and  terms  such  as  "intrinsic  confidence  coefficients"  can  be 
defined,  as  above,  to  refer  automatically  to  such  convenient  frames 
of  reference. 
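
The two frames of reference in this example can be compared directly. The sketch below is illustrative (the function names are not the paper's, and indices are 0-based, so the second outcome is j = 1): it confirms that the second outcome gives the same likelihood ratio, 99, in both experiments, while the associated pairs of error-probabilities differ.

```python
from fractions import Fraction as F

# Rows are hypotheses H_1, H_2; columns are outcomes 1, 2 (stored 0-based).
E_symmetric = [[F(99, 100), F(1, 100)],
               [F(1, 100), F(99, 100)]]
E_asymmetric = [[F(98, 99), F(1, 99)],
                [F(0), F(1)]]

def likelihood_ratio(E, j):
    """Likelihood of H_2 relative to H_1 on outcome j."""
    return E[1][j] / E[0][j]

def error_probabilities(E):
    """(Prob of outcome 2 under H_1, Prob of outcome 1 under H_2)."""
    return (E[0][1], E[1][0])

ratio_symmetric = likelihood_ratio(E_symmetric, 1)
ratio_asymmetric = likelihood_ratio(E_asymmetric, 1)
errors_symmetric = error_probabilities(E_symmetric)
errors_asymmetric = error_probabilities(E_asymmetric)
```

The same likelihood function is thus described by error-probabilities (.01, .01) in one frame and (1/99, 0) in the other, which is why the frame must be stated along with the error-probabilities.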

Part  B:  Translation  and  scale  parameters. 

9. Conditional inference methods. Let E denote any experiment
having the following structure: Y is a random variable (r.v.)
with c.d.f. $G(y - \theta)$, where G is known, and the unknown translation
parameter $\theta$ lies in any specified subset $\Omega$ of the real line.
(Alternatively, let $Y^*$ be a positive random variable with c.d.f. $G^*(y/c)$,
with $G^*$ known and the scale parameter c unknown, $0 < c < \infty$. Then
$Y = \log Y^*$ has c.d.f. $G(y - \theta)$ with translation parameter $\theta = \log c$,
where $G(u) = G^*(\exp(u))$.) Let $G(u) = \int_{-\infty}^{u} g(u)\,du$, $-\infty < u < \infty$.
Let $x = (y_1, \dots, y_n)$ denote a sample of n independent observations
on Y. Let $w = w(x) = (y_2 - y_1, \dots, y_n - y_1)$, and let $z = z(x) = y_1$;
then (w,z) is a sufficient statistic, having a probability density
function $h(w,z;\theta) = q(w)\, t(z - \theta; w)$, where the marginal density
function $q(\cdot)$ of W, and the conditional density function $t(\cdot\,;\cdot)$
of $Z - \theta$, given that W = w, are known and independent of $\theta$.

For each fixed w, let $E_w$ denote the experiment consisting of
a single observation z on the r.v. Z with p.d.f. $t(z - \theta; w)$ defined
above, with unknown translation parameter $\theta$. Let $E_q$ denote the
mixture experiment in which an observation w is taken on the r.v.
W with p.d.f. q(w), defined above, and then the experiment $E_w$ is
performed. A sufficient statistic for $E_q$ is (w,z), which has
p.d.f. $h(w,z;\theta) = q(w)\, t(z - \theta; w)$, as in E above. Thus $E = E_q$;
that is, the experiment E is equivalent for all inference purposes
to the mixture experiment $E_q$. It follows that for typical purposes
of informative inferences about $\theta$, an outcome x of E can and should
be interpreted as a corresponding outcome consisting of a single
observation z from the experiment $E_w$, which has the known
(conditional) p.d.f. $t(z - \theta; w)$.
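
The conditional density can be evaluated numerically for a given sample. The following is a minimal sketch, assuming that g is the density of $Y - \theta$, that w holds the differences $(y_2 - y_1, \dots, y_n - y_1)$, and that $t(u;w)$ is proportional to $g(u) \prod_i g(u + w_i)$ with normalizing constant q(w). The grid limits, the step count, the midpoint rule, and the function names are illustrative assumptions.

```python
import math

def conditional_density(g, w, lo=-20.0, hi=20.0, steps=8000):
    """Return (grid, t_values, q) with t(u; w) normalized on the grid.

    The grid [lo, hi] is assumed to cover essentially all of the mass.
    """
    du = (hi - lo) / steps
    grid = [lo + (s + 0.5) * du for s in range(steps)]   # cell midpoints
    raw = [g(u) * math.prod(g(u + wi) for wi in w) for u in grid]
    q = sum(raw) * du            # numerical value of the marginal q(w)
    return grid, [r / q for r in raw], q

def normal_pdf(u):
    """Standard normal density, used here only as an example of g."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical differences for a sample of n = 4 observations.
w = [1.0, -0.5, 2.0]
grid, t_values, q = conditional_density(normal_pdf, w)
du = grid[1] - grid[0]
mean_u = sum(u * tv for u, tv in zip(grid, t_values)) * du
```

For the normal example the conditional distribution of $Z - \theta$ is normal with mean $-(w_2 + \dots + w_n)/n$ and variance 1/n, which gives a check on the numerical result (here the mean is -.625).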

Instead of a fixed number n of observations $y_i$ as above,
consider any sequential sampling rule, defined with reference only
to the observed sequences of differences $w_2 = (y_2 - y_1), \dots,$
$w_m = (y_2 - y_1, \dots, y_m - y_1), \dots$, which terminates with probability
one. For any sequence of observations $x = (y_1, y_2, \dots)$, let n = n(x)
denote the number of observations $y_i$ required for termination;
then $n((y_1 - \theta, y_2 - \theta, \dots))$ is a function independent of $\theta$,
which could be written $n(x) = n(w_2(x), w_3(x), \dots)$. For each
sequence x, let $z = y_1$ and let $w = w_{n(x)}$. Then as above we have
that (w,z) is a sufficient statistic, with the distribution of w
independent of $\theta$, and the conditional p.d.f. of z having the form
$t(z - \theta; w)$ with translation parameter $\theta$. Thus the discussion and
conclusion of the preceding paragraph is applicable also to such
sequential experiments. One useful class of sequential sampling
rules has the following form: Continue sampling until the form
of $t(\cdot\,;w)$, which depends only upon w, allows (conditional) inferences
about $\theta$ which are suitably highly informative; e.g., until $t(\cdot\,;w)$
represents a translation-parameter family of distributions each of
which is sufficiently highly concentrated to provide (conditional)
confidence intervals for $\theta$ which are suitably short and have suitably
high confidence coefficients, as determined in the following section.

10. Intrinsic confidence methods. If the range of $\theta$ is the real
line, then the conditional frame of reference $E_w$ described above is
an experiment characterized completely by the likelihood function
$t(z - \theta; w)$ of $\theta$ on the observed outcome (w,z) of E; and the possible
distributions of Z in $E_w$ are a full translation group of
distributions ($-\infty < \theta < \infty$). Any conditional confidence methods of
inference, in such a case, may also be called intrinsic confidence
methods, in a natural extension of the usage introduced in Section
5 above.

Let $F(u;w) = \int_{-\infty}^{u} t(u;w)\,du$ for each u and w.
Let $u(\alpha,w)$ be defined as the solution u of the equation $\alpha = F(u;w)$,
and let $\hat\theta(z,w,\alpha) = z - u(\alpha,w)$, for each pair $\alpha$, w for which the
first equation has a unique solution. Then $\hat\theta(z,w,\alpha)$ is a lower
$\alpha$-level confidence limit estimator (and/or an upper $(1 - \alpha)$-level
confidence limit estimator) of $\theta$ (conditional on w).
$\hat\theta(z,w,.5)$ is a median-unbiased point-estimator of $\theta$. The pair
$\hat\theta(z,w,.95)$, $\hat\theta(z,w,.05)$ is a 90% confidence interval estimator
of $\theta$.
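
These limits can be computed numerically by accumulating the conditional density on a grid. The sketch below is illustrative only: it assumes $t(u;w)$ is proportional to $g(u) \prod_i g(u + w_i)$, that the grid covers essentially all of the mass, and that the solution $u(\alpha,w)$ is unique; the function names, grid limits, and sample values are hypothetical.

```python
import math

def confidence_limit(g, y, alpha, half_width=20.0, steps=8000):
    """Approximate z - u(alpha, w) for the sample y = (y_1, ..., y_n)."""
    z, w = y[0], [yi - y[0] for yi in y[1:]]
    du = 2.0 * half_width / steps
    grid = [-half_width + (s + 0.5) * du for s in range(steps)]
    raw = [g(u) * math.prod(g(u + wi) for wi in w) for u in grid]
    total = sum(raw)
    acc = 0.0
    for u, r in zip(grid, raw):
        acc += r
        if acc / total >= alpha:       # F(u; w) first reaches alpha here
            return z - (u + 0.5 * du)  # use the right edge of the cell
    raise ValueError("alpha was not reached on the grid")

def normal_pdf(u):
    """Standard normal density, used here only as an example of g."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical sample of n = 4 observations.
y = [0.3, 1.1, -0.4, 0.8]
point_estimate = confidence_limit(normal_pdf, y, 0.5)
lower_limit = confidence_limit(normal_pdf, y, 0.95)
upper_limit = confidence_limit(normal_pdf, y, 0.05)
```

In the normal example the results agree, to within the grid spacing, with the familiar values: the sample mean .45 for the point estimate, and the mean minus or plus 1.645/2 for the two limits.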

For each constant $k \ge 0$ and each w, let

$A(w,k) = \{u \mid t(u;w) \ge k\}$, and let $\gamma(w,k) = \int_{A(w,k)} t(u;w)\,du$.

For each z, w, and k, let $B = B(z,w,k) = \{\theta \mid (z - \theta) \in A(w,k)\}$.
Then B is a $\gamma$-level confidence set estimator of $\theta$ (conditional on w).
Such estimators may be called maximum likelihood (conditional)
confidence sets; it is easily verified that for each $\gamma$, such set
estimators are optimum in the sense that they have (conditionally
and unconditionally) minimum Lebesgue measure, among all set
estimators whose confidence coefficients, conditional on w, are
never smaller than $\gamma$.
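
A numerical version of the sets A(w,k) and B(z,w,k) may make the definition concrete. This is a sketch under the same assumptions as before ($t(u;w)$ proportional to $g(u) \prod_i g(u + w_i)$, evaluated at the midpoints of a grid assumed to cover the mass); the function names are illustrative, and B is returned as the list of grid points $\theta = z - u$ with u in A(w,k).

```python
import math

def ml_conditional_confidence_set(g, y, k, half_width=20.0, steps=8000):
    """Return (gamma, B) for the threshold k and sample y."""
    z, w = y[0], [yi - y[0] for yi in y[1:]]
    du = 2.0 * half_width / steps
    grid = [-half_width + (s + 0.5) * du for s in range(steps)]
    raw = [g(u) * math.prod(g(u + wi) for wi in w) for u in grid]
    q = sum(raw) * du
    t_values = [r / q for r in raw]
    # A(w, k): grid points where the conditional density is at least k.
    in_A = [(u, tv) for u, tv in zip(grid, t_values) if tv >= k]
    gamma = sum(tv for _, tv in in_A) * du
    B = sorted(z - u for u, _ in in_A)
    return gamma, B

def normal_pdf(u):
    """Standard normal density, used here only as an example of g."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Hypothetical single observation z = 2 (so w is empty), with the
# threshold set at the density one unit from the mode.
gamma, B = ml_conditional_confidence_set(normal_pdf, [2.0], normal_pdf(1.0))
```

In this example A(w,k) is the interval from -1 to 1, so B is approximately the interval from 1 to 3 and $\gamma$ is approximately .683.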

11. Discussion. It is interesting that such conditional confidence
limits and sets coincide in all formal details (though not
in interpretation) with those based upon the traditional Bayesian
"principle of insufficient reason" in which, after initial reference
to a "uniform prior distribution of $\theta$ over $-\infty < \theta < \infty$," the
likelihood function $t(z - \theta; w)$ is treated formally as a posterior
p.d.f. of $\theta$.

Furthermore,  within  the  conditional  frame  of  reference  of  an 
experiment $E_w$ which seems appropriate for purposes of informative
inference, z is a sufficient statistic, and the quasistatistic
$v(z,\theta) = z - \theta$ has the same distribution under each $\theta$. Hence it is
evidently possible to apply formally the "fiducial argument" used
by Fisher to define a "fiducial probability distribution of $\theta$." It
is interesting that the resulting "fiducial p.d.f. of $\theta$" coincides
with the likelihood function $t(z - \theta; w)$, and inference statements
about $\theta$ of the fiducial type coincide in all formal details (though
not in interpretation) with those obtained above as conditional
confidence statements about $\theta$.

The  principal  conclusion  of  the  preceding  sections  may  be  stated 
briefly  as  follows:   For  purposes  of  informative  inference  concerning 
a translation parameter $\theta$, regardless of the structure of an
experiment E (so long as $\theta$ is a translation parameter with respect
to  the  distributions  of  outcomes  of  E),  the  appropriate  frame  of 
reference  is  determined  just  by  the  likelihood  function  on  the 
outcome  observed:   this  frame  of  reference  is  an  experiment 
(generally  different  from  E)  in  which  all  possible  outcomes  give 
likelihood  functions  differing  from  that  observed  only  by  a  trans- 
lation.  We  recall  that,  as  in  Sections  5  and  8  above,  each 
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conditional  (or  intrinsic)  confidence  method  gives  only  a  partial 
summary  and  interpretation  of  the  likelihood  function,  and  that  the 
latter,  with  the  totality  of  its  possible  interpretations,  is  basic 
to  informative  inference.   (A  logarithmic  transformation  reduced 
scale-parameter  problems  to  translation-parameter  probleras.) 
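
The logarithmic reduction can be illustrated with a small sketch. The example family is an assumption made here for illustration (an exponential distribution with scale σ), and the function names are hypothetical. If Y has scale parameter σ, then Z = log Y has translation parameter θ = log σ, so the distribution of Z - θ does not depend on σ.

```python
import math

def cdf_log_y(z, sigma):
    """C.d.f. of Z = log Y at z, where Y is exponential with scale sigma (assumed example)."""
    return 1.0 - math.exp(-math.exp(z) / sigma)

def cdf_pivot(u):
    """C.d.f. of Z - log(sigma) at u; it contains no sigma."""
    return 1.0 - math.exp(-math.exp(u))

u = 0.3
# Evaluate the c.d.f. of Z at u + log(sigma) for several scales;
# each value should equal cdf_pivot(u).
values = [cdf_log_y(u + math.log(sigma), sigma) for sigma in (0.5, 1.0, 4.0)]
print(values, cdf_pivot(u))
```

Every entry of `values` equals `cdf_pivot(u)` up to rounding, which is the translation-parameter property on the logarithmic scale.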

The usual general approach to point- and confidence-interval
estimation, for example as formulated and developed in [2], takes
the given experiment E as the basic frame of reference for inference
methods and statements. Examples 1, 2, 4, 5, 7, and 8 of [2] involve
translation or scale parameters, and in all but the first two
examples the confidence methods given there differ markedly from
those developed above. For purposes of informative inference,
the formulation and methods of the present paper seem preferable
in principle, for the reasons given above.

The methods of the preceding sections for translation and
scale parameters have important points of agreement and of difference
with those given by Pitman [3].

The methods of the preceding sections for translation parameters
admit immediate generalization to multiparameter problems, such as
a p.d.f. g(y₁ - θ₁, y₂ - θ₂) of a random vector Y = (Y₁, Y₂). Analogous
methods apply to rotation parameters such as the case Y = (R, Z),
with p.d.f. g(r, z - θ), 0 ≤ r < ∞, 0 ≤ z - θ < 2π, where (r, z) are
the polar coordinates of a point y in the plane. A specific example
is the case of a bivariate normal distribution, with identity
covariance matrix and with mean lying on the unit circle.
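
For the bivariate normal example the density in polar coordinates can be written out explicitly. The following is a sketch derived here from the standard normal density, not a formula taken from the text; it assumes the mean is (cos θ, sin θ) and that angles are taken modulo 2π.

```latex
% Squared distance from y = (r cos z, r sin z) to the mean (cos θ, sin θ):
\[
  \lVert y - \mu \rVert^{2} = r^{2} - 2r\cos(z - \theta) + 1 .
\]
% Bivariate normal density with identity covariance, times the Jacobian r:
\[
  g(r, z - \theta)
    = \frac{r}{2\pi}
      \exp\!\Bigl\{ -\tfrac{1}{2}\bigl( r^{2} - 2r\cos(z - \theta) + 1 \bigr) \Bigr\},
  \qquad 0 \le r < \infty .
\]
% Conditional on R = r, the angle has a density depending only on z - θ:
\[
  g(z - \theta \mid r) \;\propto\; \exp\{\, r\cos(z - \theta) \,\} .
\]
```

The density depends on θ only through z - θ, so θ acts as a rotation parameter. The marginal distribution of R is free of θ, which suggests that r plays a role analogous to that of w in the translation-parameter case.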